Skip to content

Conversation

@d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Aug 4, 2025

This PR will give each codec a v2 and v3 JSON de/serialization routines.

depends on #3318

@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 29, 2025

I think the last thing I need to do here is write a test that ensures compatibility between these changes and older versions of zarr python 3.x.

@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 30, 2025

I am concerned that the additions to the codec API in this PR will be disruptive to people who implemented custom Zarr V3 codecs, e.g.., anyone who defined a class that inherited from in zarr.abc.codec. That argues for a totally new codec API, which is not a prospect I take lightly, but I think it's the best way to avoid breaking existing workflows.

@maxrjones
Copy link
Member

I am concerned that the additions to the codec API in this PR will be disruptive to people who implemented custom Zarr V3 codecs, e.g.., anyone who defined a class that inherited from in zarr.abc.codec. That argues for a totally new codec API, which is not a prospect I take lightly, but I think it's the best way to avoid breaking existing workflows.

yeah currently in virtual-tiff I get lots of failures (119 failed, 317 passed) with most due to KeyError: 'configuration' and some due to TypeError: Invalid JSON: {'name': 'numcodecs.zlib', 'configuration': {}}.

I think this would probably be an issue for other parsers too, such as gribberish and Sean's new HRRRarser.

The VirtualiZarr tests also fail with errors such as: TypeError: Zlib.__init__() got an unexpected keyword argument 'name'.

@d-v-b
Copy link
Contributor Author

d-v-b commented Oct 3, 2025

I am concerned that the additions to the codec API in this PR will be disruptive to people who implemented custom Zarr V3 codecs, e.g.., anyone who defined a class that inherited from in zarr.abc.codec. That argues for a totally new codec API, which is not a prospect I take lightly, but I think it's the best way to avoid breaking existing workflows.

yeah currently in virtual-tiff I get lots of failures (119 failed, 317 passed) with most due to KeyError: 'configuration' and some due to TypeError: Invalid JSON: {'name': 'numcodecs.zlib', 'configuration': {}}.

I think this would probably be an issue for other parsers too, such as gribberish and Sean's new HRRRarser.

The VirtualiZarr tests also fail with errors such as: TypeError: Zlib.__init__() got an unexpected keyword argument 'name'.

This is super useful feedback. I'll add virtual-tiff as a dev dependency while I work out how to make these changes non-breaking.

@d-v-b
Copy link
Contributor Author

d-v-b commented Oct 3, 2025

yeah currently in virtual-tiff I get lots of failures (119 failed, 317 passed) with most due to KeyError: 'configuration' and some due to TypeError: Invalid JSON: {'name': 'numcodecs.zlib', 'configuration': {}}.

More context for this:

>>> from numcodecs.zarr3 import Zlib as ZlibV3
>>> from numcodecs import Zlib
>>> Zlib().get_config()
{'id': 'zlib', 'level': 1}
>>> ZlibV3().to_dict()
/Users/d-v-b/.cache/uv/archive-v0/RPIFUeEX8IUCTWZnqf1cL/lib/python3.12/site-packages/numcodecs/zarr3.py:164: UserWarning: Numcodecs codecs are not in the Zarr version 3 specification and may not be supported by other zarr implementations.
  super().__init__(**codec_config)
{'name': 'numcodecs.zlib', 'configuration': {}}
>>> ZlibV3(fake_param=10).to_dict()
{'name': 'numcodecs.zlib', 'configuration': {'fake_param': 10}}

What you see here is a massive flaw in the slapdash design of the codecs in numcodecs.zarr3, which is that __init__ does not inspect the parameters at all! Zlib is configured with a level parameter, which has a default value of 1 for numcodecs.Zlib. But numcodecs.zarr3.Zlib doesn't know anything about the codec it wraps, and so it doesn't generate the default level parameter. This means numcodecs.zarr3.Zlib generates invalid zlib metadata! Great stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants